A Word Similarity Algorithm with Sememe Probability Density Ratio Based on HowNet
Abstract
Word similarity computation plays an important role in natural language processing (NLP). Recently, algorithms based on HowNet have been widely used and have proven to work well for Chinese word similarity computation. However, they do not consider the relationship between the number of brother nodes and the fineness of the hierarchy. This paper investigates the ratio between two words with respect to their number of brother nodes, called the sememe probability density, and proposes an improved algorithm based on HowNet. The results indicate that the correlation measure of the proposed algorithm is 75.4%, well above the 68.1% achieved by the major state-of-the-art method.
Similar resources
Chinese HowNet-Based Multi-factor Word Similarity Algorithm Integrated of Result Modification
In this paper, we first describe a novel approach to calculating Chinese sememe similarity based on the HowNet hierarchical sememe tree. When calculating the sememe similarity, we not only take Semantic Distance, Node Depth and Semantic Coincidence Degree into consideration, but also propose two impact factors named Node Environment Dense (NED) and Node Layer Ratio (NLR) to optimize the ca...
Word Semantic Similarity Calculation Based on Domain Knowledge and HowNet
Word semantic similarity is the foundation of semantic processing and a key issue in many applications. This paper argues that word semantic similarity should be tied to domain knowledge, which traditional methods do not take into account. To incorporate domain knowledge into semantic similarity measurement, this paper proposes a sensitive word sets approach. For this purpose, we a...
Chinese Word Sense Disambiguation with PageRank and HowNet
Word sense disambiguation is a basic problem in natural language processing. This paper proposes an unsupervised word sense disambiguation method based on PageRank and HowNet. In this method, a free text is first represented as a sememe graph, with sememes as vertices and the relatedness of sememes as weighted edges based on HowNet. Then UW-PageRank is applied on the sememe graph to score the importan...
The Research of Chinese Words Semantic Similarity Calculation with Multi-Information
Text similarity has a relatively wide range of applications in many fields, such as intelligent information retrieval, question answering systems, text rechecking, and machine translation. Meaning-based text similarity computation has become more widely used for computing the similarity of words and phrases. Using the knowledge structure of the and its method of knowledg...
Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and formed linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we explore, for the first time, automatically predicting lexical sememes based on semantic meanings...